Emotion type classification for interactive dialog system

ABSTRACT

Techniques for selecting an emotion type code associated with semantic content in an interactive dialog system. In an aspect, fact or profile inputs are provided to an emotion classification algorithm, which selects an emotion type based on the specific combination of fact or profile inputs. The emotion classification algorithm may be rules-based or derived from machine learning. A previous user input may be further specified as input to the emotion classification algorithm. The techniques are especially applicable in mobile communications devices such as smartphones, wherein the fact or profile inputs may be derived from usage of the diverse function set of the device, including online access, text or voice communications, scheduling functions, etc.

BACKGROUND

Artificial interactive dialog systems are an increasingly widespread feature in state-of-the-art consumer electronic devices. For example, modern wireless smartphones incorporate speech recognition, interactive dialog, and speech synthesis software to engage in real-time interactive conversation with a user to deliver such services as information and news, remote device configuration and programming, conversational rapport, etc.

To allow the user to experience a more natural and seamless conversation with the dialog system, it is desirable to generate speech or other output having emotional content in addition to semantic content. For example, when delivering news, scheduling tasks, or otherwise interacting with the user, it would be desirable to impart emotional characteristics to the synthesized speech and/or other output to more effectively engage the user in conversation.

Accordingly, it is desirable to provide techniques for determining suitable emotions to impart to semantic content delivered by an interactive dialog system, and classifying such determined emotions according to one of a plurality of predetermined emotion types.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Briefly, various aspects of the subject matter described herein are directed towards techniques for providing an apparatus for an interactive dialog system. In an aspect, fact or profile inputs available to a mobile communications device may be combined with previous or current user input to select an appropriate emotion type code to associate with an output statement generated by the interactive dialog system. The fact or profile inputs may be derived from certain aspects of the device usage, e.g., user online activity, user communications, calendar and scheduling functions, etc. The algorithms for selecting the emotion type code may be rules-based, or pre-configured using machine learning techniques. The emotion type code may be combined with the output statement to generate synthesized speech having emotional characteristics for an improved user experience.

Other advantages may become apparent from the following detailed description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a scenario employing a mobile communications device wherein techniques of the present disclosure may be applied.

FIG. 2 illustrates an exemplary embodiment of processing that may be performed by a processor and other elements of a device.

FIG. 3 illustrates an exemplary embodiment of processing performed by a dialog engine.

FIG. 4 illustrates an exemplary embodiment of an emotion type classification block according to the present disclosure.

FIG. 5 illustrates an exemplary embodiment of a hybrid emotion type classification algorithm.

FIG. 6 illustrates an exemplary embodiment of a rules-based algorithm.

FIG. 7 illustrates an alternative exemplary embodiment of a rules-based algorithm.

FIG. 8 illustrates an exemplary embodiment of a training scheme for deriving a trained algorithm for selecting emotion type.

FIG. 9 illustrates an exemplary embodiment of a method according to the present disclosure.

FIG. 10 schematically shows a non-limiting computing system that may perform one or more of the above-described methods and processes.

FIG. 11 illustrates an exemplary embodiment of an apparatus according to the present disclosure.

FIG. 12 illustrates an exemplary embodiment wherein techniques of the present disclosure are incorporated in a dialog system with emotional content imparted to displayed text, rather than or in addition to audible speech.

DETAILED DESCRIPTION

Various aspects of the technology described herein are generally directed towards a technology for selecting an emotion type code associated with an output statement in an electronic interactive dialog system. The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary aspects of the invention and is not intended to represent the only exemplary aspects in which the invention can be practiced. The term “exemplary” used throughout this description means “serving as an example, instance, or illustration,” and should not necessarily be construed as preferred or advantageous over other exemplary aspects. The detailed description includes specific details for the purpose of providing a thorough understanding of the exemplary aspects of the invention. It will be apparent to those skilled in the art that the exemplary aspects of the invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the novelty of the exemplary aspects presented herein.

FIG. 1 illustrates a scenario employing a mobile communications device 120 wherein techniques of the present disclosure may be applied. Note FIG. 1 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to applications involving mobile communications devices. For example, techniques described herein may readily be applied in other devices and systems, e.g., in the human interface systems of notebook and desktop computers, automobile navigation systems, etc. Such alternative applications are contemplated to be within the scope of the present disclosure.

In FIG. 1, user 110 communicates with mobile communications device 120, e.g., a handheld smartphone. A smartphone may be understood to include any mobile device integrating communications functions such as voice calling and Internet access with a relatively sophisticated microprocessor for implementing a diverse array of computational tasks. User 110 may provide speech input 122 to microphone 124 on device 120. One or more processors 125 within device 120, and/or processors (not shown) available over a network (e.g., implementing a cloud computing scheme), may process the speech signal received by microphone 124, e.g., performing functions as further described with reference to FIG. 2 hereinbelow. Note processor 125 need not have any particular form, shape, or functional partitioning; the partitioning described herein is for exemplary purposes only, and such processors may generally be implemented using a variety of techniques known in the art.

Based on processing performed by processor 125, device 120 may generate speech output 126 responsive to speech input 122 using audio speaker 128. In certain scenarios, device 120 may also generate speech output 126 independently of speech input 122, e.g., device 120 may autonomously provide alerts or relay messages from other users (not shown) to user 110 in the form of speech output 126. In an exemplary embodiment, output responsive to speech input 122 may also be displayed on display 129 of device 120, e.g., as text, graphics, animation, etc.

FIG. 2 illustrates an exemplary embodiment of an interactive dialog system 200 that may be implemented by processor 125 and other elements of device 120. Note the processing shown in FIG. 2 is for illustrative purposes only, and is not meant to restrict the scope of the present disclosure to any particular sequence or set of operations shown in FIG. 2. For example, in alternative exemplary embodiments, certain techniques disclosed herein for selecting an emotion type code may be applied independently of the processing shown in FIG. 2. Furthermore, one or more blocks shown in FIG. 2 may be combined or omitted depending on specific functional partitioning in the system, and therefore FIG. 2 is not meant to suggest any functional dependence or independence of the blocks shown. Such alternative exemplary embodiments are contemplated to be within the scope of the present disclosure.

In FIG. 2, at block 210, speech input is received. Speech input 210 may correspond to a waveform representation of an acoustic signal derived from, e.g., microphone 124 on device 120. The output 210a of speech input 210 may correspond to a digitized version of the acoustic waveform containing speech content.

At block 220, speech recognition is performed on output 210a. In an exemplary embodiment, speech recognition 220 translates speech such as present in output 210a into text. The output 220a of speech recognition 220 may accordingly correspond to a textual representation of speech present in the digitized acoustic waveform output 210a. For example, if output 210a includes an audio waveform representation of a human utterance such as “What is the weather tomorrow?” e.g., as picked up by microphone 124, then speech recognition 220 may output ASCII text (or other text representation) corresponding to the text “What is the weather tomorrow?” based on its speech recognition capabilities. Speech recognition at block 220 may be performed using acoustic modeling and language modeling techniques including, e.g., Hidden Markov Models (HMMs), neural networks, etc.

At block 230, language understanding is performed on the output 220a of speech recognition 220, based on knowledge of the expected natural language of output 210a. In an exemplary embodiment, natural language understanding techniques such as parsing and grammatical analysis may be performed using knowledge of, e.g., morphology and syntax, to derive the intended meaning of the text in output 220a. The output 230a of language understanding 230 may include a formal representation of the semantic and/or emotional content of the speech present in output 220a.

At block 240, a dialog engine generates a suitable response to the speech as determined from output 230a. For example, if language understanding 230 determines that the user speech input corresponds to a query regarding the weather for a particular geography, then dialog engine 240 may obtain and assemble the requisite weather information from sources, e.g., a weather forecast service or database. For example, retrieved weather information may correspond to a time/date code for the weather forecast, a weather type code corresponding to “sunny” weather, and a temperature field indicating an average temperature of 72 degrees.

In an exemplary embodiment, dialog engine 240 may further “package” the retrieved information so that it may be presented for ready comprehension by the user. Accordingly, the semantic content output 240a of dialog engine 240 may correspond to a representation of the semantic content such as “today's weather sunny; temperature 72 degrees.”

In addition to semantic content 240a, dialog engine 240 may further generate an emotion type code 240b associated with semantic content 240a. Emotion type code 240b may indicate a specific type of emotional content to impart to semantic content 240a when delivered to the user as output speech. For example, if the user is planning to picnic on a certain day, then a sunny weather forecast may be simultaneously delivered with an emotionally upbeat tone of voice. In this case, emotion type code 240b may refer to an emotional content type corresponding to “moderate happiness.” Techniques for generating the emotion type code 240b based on data, facts, and inputs available to the interactive dialog system 200 will be further described hereinbelow, e.g., with reference to FIG. 3.

At block 250, language generation is performed on the outputs 240a, 240b of dialog engine 240. Language generation presents the output of dialog engine 240 in a natural language format, e.g., as sentences in a target language obeying lexical and grammatical rules, for ready comprehension by a human user. For example, based on the semantic content 240a, language generation 250 may generate the following statement: “The weather today will be 72 degrees and sunny.”

In an exemplary embodiment, block 250 may further accept input 255a from system personality block 255. System personality block 255 may specify default parameters 255a for the dialog engine according to a pre-selected “personality” for the interactive dialog system. For example, if the system personality is chosen to be “male” or “female,” or “cheerful” or “thoughtful,” then block 255 may specify parameters corresponding to the system personality as reference input 255a. Note in certain exemplary embodiments, block 255 may be omitted, or its functionality may be incorporated in other blocks, e.g., dialog engine 240 or language generation block 250, and such alternative exemplary embodiments are contemplated to be within the scope of the present disclosure.

In an exemplary embodiment, language generation block 250 may combine semantic content 240a, emotion type code 240b, and default emotional parameters 255a to synthesize an output statement 250a. For example, an emotion type code 240b corresponding to “moderate happiness” may cause block 250 to generate a natural language (e.g., English) sentence such as “Great news—the weather today will be 72 degrees and sunny!” Output statement 250a of language generation block 250 is provided to the subsequent text-to-speech block 260 to generate audio speech corresponding to the output statement 250a.
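By way of illustration only, the following minimal Python sketch shows one way language generation block 250 might select phrasing according to emotion type code 240b. The template table, function name, and field names are hypothetical, and the sketch is not prescribed by the present disclosure.

    # Hypothetical template table keyed by emotion type; phrasings
    # follow the weather example given in the text.
    TEMPLATES = {
        "neutral": "The weather today will be {temp} degrees and {sky}.",
        "moderate_happiness":
            "Great news - the weather today will be {temp} degrees and {sky}!",
    }

    def generate_statement(semantic_content: dict, emotion_type_code: str) -> str:
        """Render semantic content 240a as a natural-language statement,
        choosing phrasing according to emotion type code 240b."""
        template = TEMPLATES.get(emotion_type_code, TEMPLATES["neutral"])
        return template.format(temp=semantic_content["temperature"],
                               sky=semantic_content["weather_type"])

    # Reproduces the example sentence given in the text.
    print(generate_statement({"weather_type": "sunny", "temperature": 72},
                             "moderate_happiness"))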

Note in certain exemplary embodiments, some functionality of the language generation block 250 described hereinabove may be omitted. For example, language generation block 250 need not specifically account for emotion type code 240b in generating output statement 250a, and text-to-speech block 260 (which also has access to emotion type code 240b) may instead be relied upon to provide the full emotional content of the synthesized speech output. Furthermore, in certain instances where information retrieved by dialog engine 240 is already in a natural language format, language generation block 250 may effectively be bypassed. For example, an Internet weather service accessed by dialog engine 240 may provide weather updates directly in a natural language such as English, so that language generation 250 may not need to do any substantial post-processing on the semantic content 240a. Such alternative exemplary embodiments are contemplated to be within the scope of the present disclosure.

At block 260, text-to-speech conversion is performed on output 250a of language generation 250. In an exemplary embodiment, emotion type code 240b is also provided to TTS block 260 to synthesize speech having text content corresponding to 250a and emotional content corresponding to emotion type code 240b. The output of text-to-speech conversion 260 may be an audio waveform.

At block 270, an acoustic output is generated from the output of text-to-speech conversion 260. The speech output may be provided to a listener, e.g., user 110 in FIG. 1, by speaker 128 of device 120.

As interactive dialog systems become increasingly sophisticated, it would be desirable to provide techniques for effectively selecting suitable emotion type codes for speech and other types of output generated by such systems. For example, as suggested by the provision of emotion type code 240b along with semantic content 240a, in certain applications it is desirable for speech output 270 to be generated not only as an emotionally neutral rendition of text, but also to incorporate a pre-specified emotional content when delivered to the listener. Thus the output statement 250a may be associated with a suitable emotion type code 240b such that user 110 will perceive an appropriate emotional content to be present in speech output 270.

For example, if dialog engine 240 specifies that semantic content 240a corresponds to information that a certain baseball team has won the World Series, and user 110 is further a fan of that baseball team, then choosing emotion type code 240b to represent “excited” (as opposed to, e.g., neutral or unhappy) to match the user's emotional state would likely result in a more satisfying interactive experience for user 110.

FIG. 3 illustrates an exemplary embodiment 240.1 of processing performed by dialog engine 240 to generate appropriate semantic content as well as an associated emotion type code. Note FIG. 3 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to any particular application of the techniques described herein.

In FIG. 3, dialog engine 240.1 includes a semantic content generation block 310 and an emotion type classification block 320, also referred to herein as a “classification block.” Both blocks 310 and 320 are provided with user dialog input 230a, which may include the output of language understanding 230 performed on one or more statements or queries by user 110 in the current or any previous dialog session. In particular, semantic content generation block 310 generates semantic content 240.1a corresponding to information to be delivered to the user, while emotion type classification block 320 generates an appropriate emotion type, represented by emotion type code 240.1b, to be imparted to semantic content 240.1a. Note user dialog input 230a may be understood to include any or all user inputs from current or previous dialog sessions, e.g., as stored in history files in a local device memory, etc.

In addition to user dialog input 230a, block 320 is further provided with “fact or profile” inputs 301, which may include parameters derived from usage of the device on which dialog engine 240.1 is implemented. Emotion type classification block 320 may generate the appropriate emotion type code 240.1b based on the combination of fact or profile inputs 301 and user dialog input 230a according to one or more algorithms, e.g., with parameters trained off-line according to machine learning techniques further disclosed hereinbelow. In an exemplary embodiment, emotion type code 240.1b may include a specification of both the emotion (e.g., “happy,” etc.) as well as a degree indicator indicating the degree to which that emotion is exhibited (e.g., a number from 1-5, with 5 indicating “very happy”). In an exemplary embodiment, emotion type code 240.1b may be expressed in a format such as specified in an Emotion Markup Language (EmotionML) for specifying one of a plurality of predetermined emotion types that may be imparted to the output speech.
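For concreteness, the following sketch shows one possible representation of emotion type code 240.1b as an emotion category plus a 1-5 degree indicator, together with a simplified EmotionML rendering. The linear mapping of the degree onto EmotionML's 0-1 value scale, and the omission of the category-set vocabulary declaration that a fully conforming EmotionML document carries, are assumptions made for brevity.

    from dataclasses import dataclass

    @dataclass
    class EmotionTypeCode:
        """Emotion type code 240.1b: an emotion category plus a degree
        indicator from 1 (mild) to 5 (e.g., 'very happy')."""
        category: str  # e.g., "happy"
        degree: int    # e.g., 5

        def to_emotionml(self) -> str:
            # Assumption: map the 1-5 degree linearly onto the 0-1 scale
            # used by EmotionML's 'value' attribute.
            value = self.degree / 5.0
            return ('<emotion xmlns="http://www.w3.org/2009/10/emotionml">'
                    f'<category name="{self.category}" value="{value:.1f}"/>'
                    '</emotion>')

    print(EmotionTypeCode("happy", 5).to_emotionml())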

It is noted that a current trend is for modern consumer devices such as smartphones to increasingly take on the role of indispensable personal assistants, integrating diverse feature sets into a single mobile device carried by the user frequently, and often continuously. The repeated use of such a device by a single user for a wide variety of purposes (e.g., voice communications, Internet access, schedule planning, recreation, etc.) allows potential access by interactive dialog system 200 to a great deal of relevant data for selecting emotion type code 240.1b. For example, if location services are enabled for a smartphone, then data regarding the user's geographical locale over a period of time may be used to infer certain of the user's geographical preferences, e.g., being a fan of a local sports team, or a propensity for trying new restaurants in a certain area, etc. Other examples of usage scenarios generating relevant data include, but are not limited to, accessing the Internet using a smartphone to perform topic or keyword searches, scheduling calendar dates or appointments, setting up user profiles during device initialization, etc. Such data may be collectively utilized by a dialog system to assess an appropriate emotion type code 240.1b to impart to semantic content 240.1a during an interactive dialog session with user 110. In view of such usage scenarios, it is especially advantageous to derive at least one, or even multiple, fact or profile inputs 301 from the usage of a mobile communications device implementing the interactive dialog system.

FIG. 4 illustrates an exemplary embodiment 320.1 of an emotion type classification block according to the present disclosure. In FIG. 4, exemplary fact or profile inputs 301.1 obtainable by device 120 include a plurality of fact or profile parameters 402-422 selected by a system designer as relevant to the task of emotion type classification. Note exemplary fact or profile inputs 301.1 are given for illustrative purposes only. In alternative exemplary embodiments, any of the individual parameters of fact or profile inputs 301.1 may be omitted, and/or other parameters not shown in FIG. 4 may be added. The parameters 402-422 need not describe disjoint classes of parameters, i.e., a single type of input used by emotion type classification block 320.1 may simultaneously fall into two or more categories of the inputs 402-422. Such alternative exemplary embodiments are contemplated to be within the scope of the present disclosure.

User configuration 402 includes information directly input by user 110 to device 120 that aids in emotion type classification. In an exemplary embodiment, during set-up of device 120, or generally during operation of device 120, user 110 may be asked to answer a series of profile questions. For example, user 110 may be queried regarding age and gender, hobbies, interests, favorite movies, sports, personality traits, etc. In some instances, information regarding a user's personality traits (e.g., extrovert or introvert, dominant or submissive, etc.) may be inferred by asking questions from personality profile questionnaires. Information from user configuration 402 may be stored for later use by emotion type classification block 320.1 for selecting emotion type code 240.1b.

User online activity 404 includes Internet usage statistics and/or content of data transmitted to and from the Internet or other networks via device 120. In an exemplary embodiment, online activity 404 may include user search queries, e.g., as submitted to a Web search engine via device 120. The contents of user search queries may be noted, as well as other statistics such as frequency and/or timing of similar queries, etc. In an exemplary embodiment, online activity 404 may further include identities of frequently accessed websites, contents of e-mail messages, postings to social media websites, etc.

User communications 406 includes text or voice communications conducted using device 120. Such communications may include, e.g., text messages sent via short messaging service (SMS), voice calls over the wireless network, etc. User communications 406 may also include messaging on native or third-party social media networks, e.g., Internet websites accessed by user 110 using device 120, or instant messaging or chatting applications, etc.

User location 408 may include records of user location available to device 120, e.g., via wireless communications with one or more cellular base stations, or Internet-based location services, if such services are enabled. User location 408 may further specify a location context of the user, e.g., if the user is at home or at work, in a car, in a crowded environment, in a meeting, etc.

Calendar/scheduling functions/local date and time 410 may include time information relevant to emotion classification based on the schedule of a user's activities. For example, such information may be premised on use of device 120 by user 110 as a personal scheduling organizer. In an exemplary embodiment, whether a time segment on a user's calendar is available or unavailable may be relevant to classification of emotion type. Furthermore, the nature of an upcoming appointment, e.g., a scheduled vacation or an important business meeting, may also be relevant.

Calendar/scheduling functions/local date and time 410 may further incorporate information such as whether a certain time overlaps with working hours for the user, or whether the current date corresponds to a weekend, etc.

User emotional state 412 includes data related to determination of a user's real-time emotional state. Such data may include the content of the user's utterances to the dialog system, as well as voice parameters, physiological signals, etc. Emotion-recognition technology may further be utilized to infer a user's emotions by sensing, e.g., user speech, facial expression, recent text messages communicated to and from device 120, physiological signs including body temperature and heart rate, etc., as sensed by various sensors (e.g., physical sensor inputs 420) on device 120.

Device usage statistics 414 includes information concerning how frequently user 110 uses device 120, how long the user has used device 120, for what purposes, etc. In an exemplary embodiment, the times and frequency of user interactions with device 120 throughout the day may be recorded, as well as the applications used, or websites visited, during those interactions.

Online information resources 416 may include news or events related to a user's interests, as obtained from online information sources. For example, based on a determination that user 110 is a fan of a sports team, online information resources 416 may include news that that sports team has recently won a game. Alternatively, if user 110 is determined to have a preference for a certain type of cuisine, then online information resources 416 may include news that a new restaurant of that type has just opened near the user's home.

Digital assistant (DA) personality 418 may specify a personality profile for the dialog system, so that interaction with the dialog system by the user more closely mimics interaction with a human assistant. The DA personality profile may specify, e.g., whether the DA is an extrovert or introvert, dominant or submissive, or the gender of the DA. For example, DA personality 418 may specify a profile corresponding to a female, cheerful personality for the digital assistant. Note this feature may be provided alternatively to, or in conjunction with, system personality block 255 as described hereinabove with reference to FIG. 2.

Physical sensor inputs 420 may include signals derived from sensors on device 120 for sensing physical parameters of device 120. For example, physical sensor inputs 420 may include sensor signals from accelerometers and/or gyroscopes in device 120, e.g., to determine if user 110 is currently walking or in a car, etc. Knowledge of a user's current mobility situation may provide information to emotion type classification block 320.1 aiding in generating an appropriate emotional response. Physical sensor inputs 420 may also include sensor signals from microphones or other acoustic recording devices on device 120, e.g., to infer characteristics of the environment based on the background noise, etc.

Conversation history 422 may include any records of present and past conversations between the user and the digital assistant.

Fact or profile inputs 301.1, along with user dialog input 230a, may be provided as input to emotion type classification algorithm 450 of emotion type classification block 320.1. Emotion type classification algorithm 450 may map the multi-dimensional vector specified by the specific fact or profile inputs 301.1 and user dialog input 230a to a specific output determination of emotion type code 240.1b, e.g., specifying an appropriate emotion type and corresponding degree of that emotion.
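The following sketch illustrates the shape of that mapping: a container bundling fact or profile parameters 402-422, and a classification function over the container and the user dialog input. Field names are illustrative, and the function body is deliberately left open, since it may be realized by the rules-based or trained algorithms described below.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class FactOrProfileInputs:
        """Illustrative container mirroring fact or profile inputs 301.1."""
        user_configuration: dict = field(default_factory=dict)    # 402
        online_activity: list = field(default_factory=list)       # 404
        communications: list = field(default_factory=list)        # 406
        location: Optional[str] = None                             # 408
        calendar: dict = field(default_factory=dict)               # 410
        emotional_state: Optional[str] = None                      # 412
        usage_statistics: dict = field(default_factory=dict)       # 414
        online_resources: list = field(default_factory=list)       # 416
        da_personality: dict = field(default_factory=dict)         # 418
        sensor_inputs: dict = field(default_factory=dict)          # 420
        conversation_history: list = field(default_factory=list)   # 422

    def classify_emotion(inputs: FactOrProfileInputs, user_dialog_input: str):
        """Emotion type classification algorithm 450: maps the combined
        inputs to an emotion type code 240.1b (body left to a concrete
        rules-based or trained algorithm)."""
        ...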

FIG. 5 illustrates an exemplary embodiment 450.1 of a hybrid emotion type classification algorithm. Note FIG. 5 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to any particular type of algorithm shown.

In FIG. 5, emotion type classification algorithm 450.1 includes algorithm selection block 510 for choosing at least one algorithm to be used for selecting emotion type. In an exemplary embodiment, the at least one algorithm includes rules-based algorithms 512 and trained algorithms 514. Rules-based algorithms 512 may correspond to algorithms specified by designers of the dialog system, and may generally be based on fundamental rationales as discerned by the designers for assigning a given emotion type to particular scenarios, facts, profiles, and/or user dialog inputs. Trained algorithms 514, on the other hand, may correspond to algorithms whose parameters and functional mappings are derived, e.g., offline, from large sets of training data. It will be appreciated that the inter-relationships between inputs and outputs in trained algorithms 514 may be less transparent to the system designer than in rules-based algorithms 512, and trained algorithms 514 may generally capture more intricate inter-dependencies amongst the variables as determined from algorithm training.

As seen in FIG. 5, both rules-based algorithms 512 and trained algorithms 514 may accept as inputs the fact or profile inputs 301.1 and user dialog input 230a. Algorithm selection block 510 may select an appropriate one of algorithms 512 or 514 to use for selecting emotion type code 240.1b in any instance. For example, in response to fact or profile inputs 301.1 and/or user dialog input 230a corresponding to a pre-determined set of values, selection block 510 may choose to implement a particular rules-based algorithm 512 instead of trained algorithm 514, or vice versa. In an exemplary embodiment, rules-based algorithms 512 may be preferred in certain cases over trained algorithms 514, e.g., if their design based on fundamental rationales may result in more accurate classification of emotion type in certain instances. Rules-based algorithms 512 may also be preferred in certain scenarios wherein, e.g., not enough training data is available to design a certain type of trained algorithm 514. In an exemplary embodiment, rules-based algorithms 512 may be chosen when it is relatively straightforward for a designer to derive an expected response based on a particular set of inputs.
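A minimal sketch of algorithm selection block 510 follows, assuming a simple policy in which a designer-specified rule is applied whenever its preconditions match the inputs, and the trained algorithm is used otherwise; the rule interface is hypothetical.

    def select_emotion_type(inputs, user_dialog_input, rules, trained_model):
        """Hybrid algorithm 450.1: prefer a matching rules-based
        algorithm 512; otherwise fall back to trained algorithm 514."""
        for rule in rules:  # rules-based algorithms 512
            if rule.matches(inputs, user_dialog_input):
                return rule.apply(inputs, user_dialog_input)
        return trained_model.predict(inputs, user_dialog_input)  # 514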

FIG. 6 illustrates an exemplary embodiment 600 of a rules-based algorithm. Note FIG. 6 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to rules-based algorithms, to any particular implementation of rules-based algorithms, or to any particular format or content for the fact or profile inputs 301.1 or emotion types 240b shown.

In FIG. 6, at decision block 610, it is determined whether user emotional state 412 is “Happy.” If no, the algorithm proceeds to block 612, which sets emotion type code 240b to “Neutral.” If yes, the algorithm proceeds to decision block 620.

At decision block 620, it is further determined whether a personality parameter 402.1 of user configuration 402 is “Extrovert.” If no, the algorithm proceeds to block 622, which sets emotion type code 240b to “Interested(1),” denoting an emotion type of “Interested” with a degree of 1. If yes, the algorithm proceeds to block 630, which sets emotion type code 240b to “Happy(3).”
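Algorithm 600 can be transcribed directly into code; the following rendering mirrors decision blocks 610 and 620 and output blocks 612, 622, and 630.

    def rules_based_600(user_emotional_state: str, personality: str) -> str:
        """Rules-based algorithm 600 (FIG. 6)."""
        if user_emotional_state != "Happy":  # decision block 610
            return "Neutral"                 # block 612
        if personality != "Extrovert":       # decision block 620
            return "Interested(1)"           # block 622
        return "Happy(3)"                    # block 630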

It will be appreciated that rules-based algorithm 600 selectively sets the emotion type code 240b based on user personality, under the assumption that an extroverted user will be more engaged by a dialog system exhibiting a more upbeat or “happier” emotion type. Rules-based algorithm 600 further sets emotion type code 240b based on current user emotional state, under the assumption that a currently happy user will respond more positively to a system having an emotion type that is also happy. In alternative exemplary embodiments, other rules-based algorithms not explicitly described herein may readily be designed to relate emotion type code 240b to other parameters and values of fact or profile inputs 301.1.

As illustrated by algorithm 600, the determination of emotion type code 240b need not always utilize all available parameters in fact or profile inputs 301.1 and user dialog input 230a. In particular, algorithm 600 utilizes only user emotional state 412 and user configuration 402. Such exemplary embodiments of algorithms utilizing any subset of available parameters, as well as alternative exemplary embodiments of algorithms utilizing parameters not explicitly described herein, are contemplated to be within the scope of the present disclosure.

FIG. 7 illustrates an alternative exemplary embodiment 700 of a rules-based algorithm. In FIG. 7, at decision block 710, it is determined whether user dialog input 230a corresponds to a query by the user for updated news. If yes, the algorithm proceeds to decision block 720.

At decision block 720, it is determined whether user emotional state 412 is “Happy,” and further whether online information resources 416 indicate that the user's favorite sports team has just won a game. In an exemplary embodiment, the user's favorite sports team may itself be derived from other parameters of fact or profile inputs 301.1, e.g., from user configuration 402, user online activity 404, calendar/scheduling functions 410, etc. If the output of decision block 720 is yes, the algorithm proceeds to block 730, wherein emotion type code 240b is set to “Excited(3).”
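Algorithm 700 admits a similarly direct transcription. The text does not specify the “no” branches of blocks 710 and 720, so the sketch returns None there, deferring to other algorithms; that fallback is an assumption.

    def rules_based_700(user_dialog_input: str, user_emotional_state: str,
                        favorite_team_just_won: bool):
        """Rules-based algorithm 700 (FIG. 7)."""
        if "news" not in user_dialog_input.lower():  # decision block 710
            return None                              # branch unspecified in text
        if user_emotional_state == "Happy" and favorite_team_just_won:  # 720
            return "Excited(3)"                      # block 730
        return None                                  # branch unspecified in text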

In addition to rules-based algorithms for selecting emotion type code 240b, emotion type classification algorithm 450.1 may alternatively, or in conjunction, utilize trained algorithms. FIG. 8 illustrates an exemplary embodiment 800 of a training scheme for deriving a trained algorithm for selecting emotion type. Note FIG. 8 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to any particular techniques for training algorithms for selecting emotion type.

In FIG. 8, during a training phase 801, an algorithm training block 810 is provided with inputs including a series or plurality of reference fact or profile inputs {301.1*}, a corresponding series of reference previous user inputs {230a*}, and a corresponding series of reference emotion type codes {240.1b*}. Note a parameter x enclosed in braces {x} herein denotes a plurality or series of the objects x. In particular, each reference fact or profile input 301.1* corresponds to a specific combination of settings for fact or profile inputs 301.1.

For example, one exemplary reference fact or profile input 301.1* may specify user configuration 402 to include an “extroverted” personality type, user online activity 404 to include multiple instances of online searches for the phrase “Seahawks,” user location 408 to correspond to “Seattle” as a city of residence, etc. Corresponding to this reference fact or profile input 301.1*, a reference user dialog input 230a* may include a user query regarding latest sports news. In an alternative instance, the reference user dialog input 230a* corresponding to this reference fact or profile input 301.1* may be a NULL string, indicating no previous user input. Based on this exemplary combination of reference fact or profile input 301.1* and corresponding reference user dialog input 230a*, a reference emotion type code 240.1b* may be specified to algorithm training block 810 during a training phase 801.

In an exemplary embodiment, the appropriate reference emotion type code 240.1b* for particular settings of reference fact or profile input 301.1* and user dialog input 230a* may be supplied by human annotators or judges. These human annotators may be presented with individual combinations of reference fact or profile inputs and reference user inputs during training phase 801, and may annotate each combination with a suitable emotion type responsive to the situation. This process may be repeated using many human annotators and many combinations of reference fact or profile inputs and previous user inputs, such that a large body of training data is available for algorithm training block 810. Based on the training data and reference emotion type annotations, an optimal set of trained algorithm parameters 810a may be derived for a trained algorithm that most accurately maps a given combination of reference inputs to a reference output.

In an exemplary embodiment, a human annotator may possess certain characteristics that are similar or identical to corresponding characteristics of a personality of a digital assistant. For example, a human annotator may have the same gender or personality type as the configured characteristics of the digital assistant as designated by, e.g., system personality 255 and/or digital assistant personality 418.

Algorithm training block 810 is configured to, in response to the multiple supplied instances of reference fact or profile input 301.1*, user dialog input 230a*, and reference emotion type code 240.1b*, derive a set of algorithm parameters, e.g., weights, structures, coefficients, etc., that optimally map each combination of inputs to the supplied reference emotion type. In an exemplary embodiment, techniques from machine learning, e.g., supervised learning, may be utilized to optimally derive a general rule for mapping inputs to outputs. Algorithm training block 810 accordingly generates an optimal set of trained algorithm parameters 810a, which is provided to an exemplary embodiment 514.1 of trained algorithm block 514, such as shown in FIG. 5. In particular, block 514.1 selects emotion type 240.1b during real-time operation 802 according to trained algorithm parameters 810a.
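As a concrete sketch of training phase 801, the following code fits a supervised classifier to annotated reference combinations. The use of scikit-learn, the decision-tree learner, and the numeric feature encoding are all illustrative assumptions; the disclosure does not prescribe any particular learner or encoding.

    # Assumption: scikit-learn is available and features are hand-encoded.
    from sklearn.tree import DecisionTreeClassifier

    def encode(fact_or_profile: dict, user_dialog_input: str) -> list:
        """Hypothetical numeric encoding of one reference combination."""
        return [
            1 if fact_or_profile.get("personality") == "Extrovert" else 0,
            fact_or_profile.get("seahawks_searches", 0),
            1 if "sports news" in user_dialog_input.lower() else 0,
        ]

    # Tiny stand-ins for {301.1*}, {230a*}, and annotator labels {240.1b*}.
    reference_inputs = [{"personality": "Extrovert", "seahawks_searches": 12},
                        {"personality": "Introvert", "seahawks_searches": 0}]
    reference_dialogs = ["Tell me the latest sports news", ""]
    reference_labels = ["Excited(3)", "Neutral"]

    X = [encode(f, u) for f, u in zip(reference_inputs, reference_dialogs)]
    model = DecisionTreeClassifier().fit(X, reference_labels)  # parameters 810a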

Further provided hereinbelow is an illustrative description of an exemplary application of techniques of the present disclosure. Note the example is given for illustrative purposes only, and is not meant to limit the scope of the present disclosure to any particular sets or types of fact or profile inputs, system responses, or scenarios.

Mark is a football fan. He always pays attention to news about the National Football League (NFL). Being a resident of Seattle, his favorite team is the Seattle Seahawks. Every Sunday, Mark watches football games online using his smartphone, and discusses players and teams with his friends through online chatting applications. He also shares his activities and interests on social media applications. A few months ago, when the Seahawks beat the 49ers in overtime, he was very excited and discussed the win extensively on his social media profile page.

On a given Monday, the Seahawks played the 49ers in San Francisco on Monday Night Football. Unfortunately, Mark had dinner with a client, and missed the game. The dinner was an important meeting for Mark, as he was about to close a business deal. It turned out that the meeting went very well, and Mark's proposal was accepted by the clients. After dinner, on Mark's drive home, he activated the interactive dialog system (or digital assistant, DA) on his smartphone to engage in a real-time voice conversation.

Mark: “Hey, it was a great dinner! I sealed the deal. I think Don (Mark's boss) is going to love this.”

Digital Assistant (in an upbeat tone of voice): “Excellent! Good to hear.” (DA Response 1)

Mark: “So, what about those Seahawks? Tell me!”

Digital Assistant (in an excited tone of voice): “Guess what, your Seahawks won! They beat the 49ers 30 to 25. Russell Wilson threw two touchdown passes in the fourth quarter.” (DA Response 2)

Mark: “Wow, that's great. I am sorry I missed this game. I think they will be in the playoffs again this year!”

Digital Assistant (continuing in an excited voice, slightly more subdued): “Yes! I should just block off your calendar during the playoffs. I don't think you'd want to miss that!” (DA Response 3)

The preceding example illustrates several aspects of techniques of the present disclosure. In particular, the interactive dialog system knows that Mark is a football fan, and also a Seahawks fan. It obtains this information from, e.g., explicit settings configured by Mark on his digital assistant, indicating that Mark wants to track football news, and also that his favorite team is the Seahawks. From online information sources, the DA is also aware that the Seahawks played that night against their rival team, the San Francisco 49ers, and that the Seahawks came from behind to beat them. This enables the DA to select an emotion type corresponding to an excited tone of voice (DA Response 2) when reporting news of the Seahawks' win to Mark. Furthermore, based on knowledge of Mark's preferences and his previous input, the DA selects an excited tone of voice when offering to block off time for Mark in his calendar (DA Response 3).

The dialog system further has information regarding Mark's personality, as derived from, e.g., Mark's usage pattern of his smartphone (e.g., frequency of usage, time of usage, etc.), personal interests and hobbies as indicated by Mark during set-up of his smartphone, as well as status updates to his social media network. In this example, the dialog system may determine that Mark is an extrovert and a conscientious person based on machine learning algorithms designed to infer Mark's personality from the large body of statistics generated by his phone usage patterns.

Further information is derived from the fact that Mark activated the DA system over two months ago, and that he has since been using the DA regularly and with increasing frequency. In the last week, Mark interacted with the DA an average of 5 times per day. In an exemplary embodiment, certain emotion type classification algorithms may infer an increasing intimacy between Mark and the DA due to such frequency of interaction.

The DA further determines Mark's current emotional state to be happy from his voice. From his use of the calendar/scheduling function on the device, the DA knows that it is after working hours, and that Mark has just finished a meeting with his client. During the interaction, the DA identifies that Mark is in his car, e.g., from the establishment of a wireless Bluetooth connection with the car's electronics, intervals of being stationary following intervals of walking as determined by an accelerometer, the lower level of background noise inside a car, the measured velocity of movement, etc. Furthermore, from past data such as location data history matched to time-of-day statistics, etc., it is surmised that Mark is driving home after dinner. Accordingly, per a classification algorithm such as described with reference to block 450.1 in FIG. 5, the DA selects an emotion type corresponding to an upbeat tone of voice (DA Response 1).

FIG. 9 illustrates an exemplary embodiment of a method 900 according to the present disclosure. Note FIG. 9 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to any particular method shown.

In FIG. 9, at block 910, the method includes selecting, based on at least one fact or profile input, an emotion type code associated with an output statement, the emotion type code specifying one of a plurality of predetermined emotion types.

At block 920, the method includes generating speech corresponding to the output statement, the speech generated to have the predetermined emotion type specified by the emotion type code. In an exemplary embodiment, the at least one fact or profile input is derived from usage of a mobile communications device implementing an interactive dialog system.

FIG. 10 schematically shows a non-limiting computing system 1000 that may perform one or more of the above-described methods and processes. Computing system 1000 is shown in simplified form. It is to be understood that virtually any computer architecture may be used without departing from the scope of this disclosure. In different embodiments, computing system 1000 may take the form of a mainframe computer, server computer, cloud computing system, desktop computer, laptop computer, tablet computer, home entertainment computer, network computing device, mobile computing device, mobile communication device, smartphone, gaming device, etc.

Computing system 1000 includes a processor 1010 and a memory 1020. Computing system 1000 may optionally include a display subsystem, communication subsystem, sensor subsystem, camera subsystem, and/or other components not shown in FIG. 10. Computing system 1000 may also optionally include user input devices such as keyboards, mice, game controllers, cameras, microphones, and/or touch screens, for example.

Processor 1010 may include one or more physical devices configured to execute one or more instructions. For example, the processor may be configured to execute one or more instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more devices, or otherwise arrive at a desired result.

The processor may include one or more processors that are configured to execute software instructions. Additionally or alternatively, the processor may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the processor may be single-core or multi-core, and the programs executed thereon may be configured for parallel or distributed processing. The processor may optionally include individual components that are distributed throughout two or more devices, which may be remotely located and/or configured for coordinated processing. One or more aspects of the processor may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration.

Memory 1020 may include one or more physical devices configured to hold data and/or instructions executable by the processor to implement the methods and processes described herein. When such methods and processes are implemented, the state of memory 1020 may be transformed (e.g., to hold different data).

Memory 1020 may include removable media and/or built-in devices. Memory 1020 may include optical memory devices (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory devices (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory devices (e.g., hard disk drive, floppy disk drive, tape drive, MRAM, etc.), among others. Memory 1020 may include devices with one or more of the following characteristics: volatile, nonvolatile, dynamic, static, read/write, read-only, random access, sequential access, location addressable, file addressable, and content addressable. In some embodiments, processor 1010 and memory 1020 may be integrated into one or more common devices, such as an application-specific integrated circuit or a system on a chip.

Memory 1020 may also take the form of removable computer-readable storage media, which may be used to store and/or transfer data and/or instructions executable to implement the herein-described methods and processes. Memory 1020 may take the form of CDs, DVDs, HD-DVDs, Blu-Ray Discs, EEPROMs, and/or floppy disks, among others.

It is to be appreciated that memory 1020 includes one or more physical devices that store information. The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 1000 that is implemented to perform one or more particular functions. In some cases, such a module, program, or engine may be instantiated via processor 1010 executing instructions held by memory 1020. It is to be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” are meant to encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

In an aspect, computing system 1000 may correspond to a computing device including a memory 1020 holding instructions executable by a processor 1010 to select, based on at least one fact or profile input, an emotion type code associated with an output statement, the emotion type code specifying one of a plurality of predetermined emotion types. The instructions are further executable by processor 1010 to generate speech corresponding to the output statement, the speech generated to have the predetermined emotion type specified by the emotion type code. In an exemplary embodiment, the at least one fact or profile input is derived from usage of a mobile communications device implementing an interactive dialog system. Note such a computing device will be understood to correspond to a process, machine, manufacture, or composition of matter.

FIG. 11 illustrates an exemplary embodiment of an apparatus 1100 according to the present disclosure. Note the apparatus 1100 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to any particular apparatus shown.

In FIG. 11, a classification block 1120 is configured to select, based on at least one fact or profile input 1120b, an emotion type code 1120a associated with an output statement 1110a. The emotion type code 1120a specifies one of a plurality of predetermined emotion types. A text-to-speech block 1130 is configured to generate speech 1130a corresponding to the output statement 1110a and the predetermined emotion type specified by the emotion type code 1120a. In an exemplary embodiment, the at least one fact or profile input 1120b is derived from usage of a mobile communications device implementing the interactive dialog system.

Note techniques of the present disclosure need not be limited to embodiments incorporating a mobile communications device. In alternative exemplary embodiments, the present techniques may also be incorporated in non-mobile devices, e.g., desktop computers, home gaming systems, etc. Furthermore, mobile communications devices incorporating the present techniques need not be limited to smartphones, and may also include wearable devices such as computerized wristwatches, eyeglasses, etc. Such alternative exemplary embodiments are contemplated to be within the scope of the present disclosure.

FIG. 12 illustrates an exemplary embodiment 1200 wherein techniques of the present disclosure are incorporated in a dialog system with emotional content imparted to displayed text, rather than or in addition to audible speech. Note blocks shown in FIG. 12 correspond to similarly labeled blocks in FIG. 2, and certain blocks shown in FIG. 2 are omitted from FIG. 12 for ease of illustration.

In FIG. 12, output 250a of language generation block 250 is combined with emotion type code 240b generated by dialog engine 240 and input to a text-to-speech and/or text-for-display block 1260. In a text-to-speech aspect, block 1260 generates speech with semantic content 240a and emotion type code 240b. In a text-for-display aspect, block 1260 alternatively or further generates text for display with semantic content 240a and emotion type code 240b. It will be appreciated that emotion type code 240b may impart emotion to displayed text using such techniques as, e.g., adjusting the size or font of displayed text characters, providing emoticons (e.g., smiley faces or other pictures) corresponding to the emotion type code 240b, etc. In an exemplary embodiment, block 1260 alternatively or further generates emotion-based animation or graphical modifications to one or more avatars representing the DA or user on a display. For example, if emotion type code 240b corresponds to “sadness,” then a pre-selected avatar representing the DA may be generated with a pre-configured “sad” facial expression, or otherwise be animated to express sadness through motion, e.g., weeping actions. Such alternative exemplary embodiments are contemplated to be within the scope of the present disclosure.
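As an illustrative sketch of the text-for-display aspect of block 1260, the following maps an emotion type code to an emoticon and a font-size adjustment; the particular mappings are assumptions and not part of the disclosure.

    # Hypothetical display styles keyed by emotion type code 240b.
    DISPLAY_STYLES = {
        "Happy(3)":   {"emoticon": ":-)", "font_scale": 1.2},
        "Excited(3)": {"emoticon": ":-D", "font_scale": 1.4},
        "sadness":    {"emoticon": ":-(", "font_scale": 1.0},
    }

    def render_text(statement: str, emotion_type_code: str) -> dict:
        """Attach an emoticon and font treatment reflecting 240b."""
        style = DISPLAY_STYLES.get(emotion_type_code,
                                   {"emoticon": "", "font_scale": 1.0})
        return {"text": (statement + " " + style["emoticon"]).strip(),
                "font_scale": style["font_scale"]}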

In this specification and in the claims, it will be understood that when an element is referred to as being “connected to” or “coupled to” another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being “directly connected to” or “directly coupled to” another element, there are no intervening elements present. Furthermore, when an element is referred to as being “electrically coupled” to another element, it denotes that a path of low resistance is present between such elements, while when an element is referred to as being simply “coupled” to another element, there may or may not be a path of low resistance between such elements.

The functionality described herein can be performed, at least in part, by one or more hardware and/or software logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.

CLAIMS

1. An apparatus for an interactive dialog system, the apparatus comprising: a classification block configured to select, based on at least one fact or profile input, an emotion type code associated with an output statement, the emotion type code specifying one of a plurality of predetermined emotion types; and a text-to-speech block configured to generate speech corresponding to the output statement, the speech generated to have the predetermined emotion type specified by the emotion type code; wherein the at least one fact or profile input is derived from usage of a mobile communications device implementing the interactive dialog system.
2. The apparatus of claim 1, the mobile communications device configured to provide voice calling and Internet access services, the apparatus further comprising a language generation block configured to generate the output statement in a natural language, the output statement having a predetermined semantic content and a specified predetermined emotion type associated with the emotion type code.
3. The apparatus of claim 1, the at least one fact or profile input comprising at least one user configuration parameter configured by the user.
4. The apparatus of claim 3, the at least one user configuration parameter comprising at least one of hobbies, interests, personality traits, favorite movies, favorite sports, and favorite types of cuisine.
5. The apparatus of claim 3, the at least one fact or profile input further comprising at least one parameter derived from user online activity using the apparatus.
6. The apparatus of claim 5, the at least one parameter derived from user online activity comprising at least one of Internet search queries, accessed Internet websites, contents of e-mail messages, and postings to online social media websites.
7. The apparatus of claim 3, the at least one fact or profile input further comprising at least one of user location, contents of user text or voice communications, and at least one event scheduled by the user using a calendar scheduling function of the apparatus.
8. The apparatus of claim 3, the at least one fact or profile input further comprising at least one of a current user emotional state, device usage statistics, online information resources, and a digital assistant personality.
9. The apparatus of claim 2, the classification block further configured to select the emotion type code based on user dialog input.
10. The apparatus of claim 2, further comprising a text for display block generating displayed text corresponding to the output statement in the natural language.
11. The apparatus of claim 10, the natural language being English.
12. The apparatus of claim 3, the classification block configured to select the emotion type code using an algorithm comprising at least one functional mapping between a plurality of reference fact or profile inputs and a corresponding plurality of reference emotion types, the at least one functional mapping being derived from machine learning techniques.
13. A computing device including a processor and a memory holding instructions executable by the processor to: select, based on at least one fact or profile input, an emotion type code associated with an output statement, the emotion type code specifying one of a plurality of predetermined emotion types; and generate speech corresponding to the output statement, the speech generated to have the predetermined emotion type specified by the emotion type code; wherein the at least one fact or profile input is derived from usage of a mobile communications device implementing an interactive dialog system.
14. The computing device of claim 13, the computing device comprising a smartphone configured to provide voice calling and Internet access services.
15. The computing device of claim 14, the at least one fact or profile input further comprising at least one of a parameter derived from user online activity using the smartphone, user location, contents of user text or voice communications, and at least one event scheduled by the user using a calendar scheduling function of the device.
16. The computing device of claim 14, the at least one fact or profile input further comprising at least one of a current user emotional state, device usage statistics, online information resources, and a digital assistant personality.
17. A method comprising: selecting, based on at least one fact or profile input, an emotion type code associated with an output statement, the emotion type code specifying one of a plurality of predetermined emotion types; and generating speech corresponding to the output statement, the speech generated to have the predetermined emotion type specified by the emotion type code; wherein the at least one fact or profile input is derived from usage of a mobile communications device implementing an interactive dialog system.
18. The method of claim 17, the at least one fact or profile input comprising user location.
19. The method of claim 18, the at least one fact or profile input further comprising at least one of a user configuration parameter configured by the user, user online activity, user location, contents of user text or voice communications, and at least one event scheduled by the user using a calendar scheduling function.
20. The method of claim 18, the at least one fact or profile input further comprising at least one of a current user emotional state, device usage statistics, online information resources, and a digital assistant personality.